R. Taghizadeh-Mehrjardi; F. Sarmadian; A. A. Zolfaghari; A. Jafari
Abstract
Introduction: Cation exchange capacity (CEC) has long been an input parameter of many environmental models (Manrique et al., 1991). In addition, CEC data give a clearer and more complete picture of soil and plant nutrition processes and, consequently, of fertilizer and soil amendment requirements. Laboratory analysis is the most accurate method for direct measurement of CEC. However, direct measurement is difficult, particularly in the soils of arid and semi-arid regions of Iran, where large amounts of calcium carbonate make the measurement expensive, laborious, and time-consuming (Amini et al., 2005). An appropriate alternative is to predict CEC from readily available soil properties using parametric or nonparametric methods (Minasny et al., 1999). Therefore, the objective of this study was to apply and compare different data mining approaches, including multiple linear regression (MLR), multiple nonlinear regression (MNR), cascade neural network (CNN), two radial basis functions (RBF), multi-layer perceptron neural network (MLP), and adaptive neuro-fuzzy inference system (ANFIS), to estimate cation exchange capacity in different soils of Iran.
Materials and Methods: For this purpose, 1770 soil samples were selected from different sites in Iran; 356 samples were used as the testing data, and the remaining 1414 were used for training. The soil samples were dried, crushed, and passed through a 2 mm sieve to prepare them for physical and chemical analyses. The percentages of sand (50-2000 μm), silt (2-50 μm), and clay (<2 μm) were determined using the hydrometer method according to the USDA soil textural classification system. Soil organic carbon was determined using the Walkley-Black method, and CEC was measured by the standard method. The data mining techniques (i.e., MLR, MNR, CNN, RBF, MLP, ANFIS) were then applied to predict CEC from readily available data (i.e., soil organic carbon and clay percentages). Finally, to compare the efficiency of these techniques, several error criteria were applied, including root mean square error (RMSE), mean error (ME), coefficient of determination (R²), and relative improvement (RI). The uncertainty of the pedotransfer functions was also estimated using the Monte Carlo technique.
Results and Discussion: Statistical analyses indicated that soil organic matter (SOM) and soil texture have the highest variation. For example, SOM ranged from 0.01 to 2.94. The correlation coefficients show that CEC is most strongly related to clay and soil organic matter content. Thus, clay, silt, sand, and organic carbon content were the input (independent) variables (readily available properties), and CEC was the output (dependent) variable in this study. The root mean square error (RMSE) of linear and nonlinear regression was 4.74 and 4.71 meq 100 g⁻¹, respectively, indicating that the two methods predict CEC about equally well. The nonlinear regression equation increased the prediction accuracy by 0.6%. The results show that nonparametric artificial neural networks do not significantly increase the accuracy of CEC prediction. The best neural network result was obtained using MLP. The nonparametric regression tree was slightly more accurate than the artificial neural network methods (4.53 and 4.61 meq 100 g⁻¹, respectively). The best method for predicting CEC was ANFIS (RMSE = 4.02 meq 100 g⁻¹), whose prediction accuracy was 15% higher than that of linear regression. Moreover, applying the ANFIS model to data partitioned by fuzzy k-means clustering improves the prediction accuracy by up to 26%. Monte Carlo results indicate that the highest and lowest uncertainty belong to the MLR and ANFIS models, respectively.
Conclusion: In the present research, different data mining techniques were applied to predict CEC across a wide range of soils. The database of 1770 soil samples was gathered from all over Iran. The comparison indicates that the ANFIS model has the highest prediction accuracy. Moreover, partitioning the database into four groups enhances the accuracy of the models. This result confirms that pedotransfer functions are reliable only within the range of the data on which they were developed. Overall, our efforts achieved an R² of only 0.58, meaning that soil organic matter and clay percentage could explain only 58% of the variation in CEC. This suggests that more input data should be incorporated, such as the type of clay mineral, the percentage of calcium carbonate, and gypsum.
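The error criteria named in the abstract, and the relative improvements it reports, can be illustrated with a short calculation. The following is a minimal Python sketch, not the authors' code. The function names, the sign convention for ME, the 1 - SS_res/SS_tot form of R², and the definition of RI as the percent reduction in RMSE relative to the MLR model are all assumptions, because the abstract does not define them.

```python
import math

def rmse(observed, predicted):
    """Root mean square error, in the units of the data (here meq 100 g^-1)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def mean_error(observed, predicted):
    """Mean error (bias). Assumed convention: positive means over-prediction."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (one common definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def relative_improvement(rmse_reference, rmse_model):
    """Percent reduction in RMSE relative to a reference model (assumed definition)."""
    return 100.0 * (rmse_reference - rmse_model) / rmse_reference

# RMSE values reported in the abstract (meq 100 g^-1).
RMSE_MLR, RMSE_MNR, RMSE_ANFIS = 4.74, 4.71, 4.02

ri_mnr = relative_improvement(RMSE_MLR, RMSE_MNR)
ri_anfis = relative_improvement(RMSE_MLR, RMSE_ANFIS)
print(f"MNR vs MLR:   {ri_mnr:.1f} %")
print(f"ANFIS vs MLR: {ri_anfis:.1f} %")
```

With the reported RMSE values, this assumed definition gives roughly 0.6 % for MNR and roughly 15 % for ANFIS, which agrees with the figures quoted in the abstract. That agreement suggests, but does not confirm, that RI was computed against the MLR error.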